Add toByteArray(InputStream input, int size, int bufferSize)#776
Add toByteArray(InputStream input, int size, int bufferSize)#776garydgregory merged 20 commits intomasterfrom
Conversation
This introduces `toByteArray(InputStream input, int size, int bufferSize)`, which reads the stream in chunks of `bufferSize` instead of allocating the full array up front. By reading incrementally, the method: * Validates that the stream actually contains `size` bytes before completing the allocation. * Prevents excessive memory usage if a corrupted or malicious `size` value is provided. * Offers safer handling for untrusted input compared to the direct-allocation variant.
garydgregory
left a comment
There was a problem hiding this comment.
Hi @ppkarwasz
I have lots of comments! 😉
|
I removed a lot of details from the Javadoc and kept only these:
I believe that these behaviors will not change and should remain in the method contracts, what do you think? |
garydgregory
left a comment
There was a problem hiding this comment.
Hi @ppkarwasz ,
I have comments scattered about.
TY!
* Extends incremental (chunked) reading to all `toByteArray` variants when the requested size is unknown or exceeds 128 KiB. * The 128 KiB threshold matches the default buffer size used in CPython. * Updates Javadoc to emphasize that memory usage grows **proportionally** with the number of bytes actually **read**, making these methods suitable for large streams when sufficient memory is available.
|
Hi @garydgregory, I’ve refactored this proposal a bit further:
Open questions:
|
CPython is irrelevant IMO. Java's
Where do you see that? |
garydgregory
left a comment
There was a problem hiding this comment.
Hello @ppkarwasz
I added some comments.
In the source code of |
toByteArray methodThe implementation of `IOUtils.toByteArray(InputStream, int, int)` added in #776 throws different exceptions depending on the requested size: * For request sizes larger than the internal chunk size, it correctly throws an `EOFException`. * For smaller requests, it incorrectly throws a generic `IOException`. This PR makes the behavior consistent by always throwing an `EOFException` when the stream ends prematurely. Note: This also affects `RandomAccessFiles.read`. Its previous truncation behavior was undocumented and inconsistent with `RandomAccessFile.read` (which reads as much as possible). The new behavior is not explicitly documented here either, since it is unclear whether throwing on truncation is actually desirable.
The implementation of `IOUtils.toByteArray(InputStream, int, int)` added in #776 throws different exceptions depending on the requested size: * For request sizes larger than the internal chunk size, it correctly throws an `EOFException`. * For smaller requests, it incorrectly throws a generic `IOException`. This PR makes the behavior consistent by always throwing an `EOFException` when the stream ends prematurely. Note: This also affects `RandomAccessFiles.read`. Its previous truncation behavior was undocumented and inconsistent with `RandomAccessFile.read` (which reads as much as possible). The new behavior is not explicitly documented here either, since it is unclear whether throwing on truncation is actually desirable.
|
As far as I can tell from Jenkins agents failed to connect with SSH until we reverted Apache Commons IO 2.21.0 and returned to Apache Commons IO 2.20.0. We released Jenkins 2.538 less than 24 hours after 2.537 due to the severity of the issue. The stack trace reported in the failure is: I've attached a tar archive of the git repository where I've been able to duplicate the issue. |
|
Hi @MarkEWaite, I can’t reproduce it locally, but the stack trace points to @Override
public int read(byte[] b, int off, int len) throws IOException {
int r = SFTPClient.this.read(h, offset, b, off, len);
if (r < 0) return -1;
offset += r;
return r;
}This method doesn’t fully follow the Given that This looks like an issue in Trilead rather than Commons IO. I submitted a minimal fix in jenkinsci/trilead-ssh2#273 |
More details of the bug are available in: * jenkinsci/jenkins#11314 * jenkinsci/jenkins#11312 * jenkinsci/trilead-ssh2#273 Details of the changes in the library are available in the library release notes: * https://github.com/jenkinsci/trilead-ssh2/releases/tag/build-217-jenkins-371.vc1d30dc5a_b_32 Testing done: * apache/commons-io#776 (comment) includes the testing configuration I used to confirm that 2.537 fails to start SSH build agents before this change and starts SSH build agents successfully after this change Testing to be done: * Confirm that incremental build of the plugin passes BOM testing
More details of the bug are available in: * jenkinsci/jenkins#11314 * jenkinsci/jenkins#11312 * jenkinsci/trilead-ssh2#273 Details of the changes in the library are available in the library release notes: * https://github.com/jenkinsci/trilead-ssh2/releases/tag/build-217-jenkins-371.vc1d30dc5a_b_32 Testing done: * apache/commons-io#776 (comment) includes the testing configuration I used to confirm that 2.537 fails to start SSH build agents before this change and starts SSH build agents successfully after this change Testing to be done: * Confirm that incremental build of the plugin passes BOM testing
This introduces
toByteArray(InputStream input, int size, int bufferSize), which reads the stream in chunks ofbufferSizeinstead of allocating the full array up front.By reading incrementally, the method:
sizebytes before completing the allocation.sizevalue is provided.mvn; that'smvnon the command line by itself.